DECENTRALIZED DETECTION BY A LARGE NUMBER OF SENSORS


John  N.  Tsitsiklis* 


ABSTRACT 

We consider the decentralized detection problem, in which a number $N$ of identical sensors transmit
a finite-valued function of their observations to a fusion center which then decides which one of $M$
alternative hypotheses is true. We consider the case where the number of sensors tends to infinity.
We then show that it is asymptotically optimal to divide the sensors into $M(M-1)/2$ groups, with
all sensors in each group using the same decision rule in deciding what to transmit. We also show
how the optimal number of sensors in each group may be determined by solving a mathematical
programming problem. For the special case of two hypotheses and binary messages the solution
simplifies considerably: it is optimal (asymptotically, as $N \to \infty$) to have each sensor perform an
identical likelihood ratio test, and the optimal threshold is very easy to determine numerically.

The (static) decentralized detection problem is defined as follows. There are $M$ hypotheses
$H_1, \ldots, H_M$, with known prior probabilities $P(H_j) > 0$, and $N$ sensors. Let $Y$ be a set endowed with
a $\sigma$-field $\mathcal{F}$ of measurable sets. Let $y_i$, $i = 1, \ldots, N$, the observation of the $i$-th sensor, be a random
variable taking values in $Y$. We assume that the $y_i$'s are conditionally independent and identically
distributed, given either hypothesis, with a known conditional distribution $P(y \mid H_j)$, $j = 1, \ldots, M$.

Let $D$ be a positive integer. Each sensor $i$ evaluates a $D$-valued message $u_i \in \{1, \ldots, D\}$, as a
function of its own observation; that is, $u_i = \gamma_i(y_i)$, where the function $\gamma_i : Y \to \{1, \ldots, D\}$ is the
decision rule of sensor $i$ and is assumed to be a measurable function. The messages $u_1, \ldots, u_N$
are all transmitted to a fusion center which declares one of the hypotheses to be true, based on a
decision rule $\gamma_0 : \{1, \ldots, D\}^N \to \{1, \ldots, M\}$. That is, the final decision $u_0$ of the fusion center is given
by $u_0 = \gamma_0(u_1, \ldots, u_N)$. The objective is to choose the decision rules $\gamma_0, \gamma_1, \ldots, \gamma_N$ of the sensors
and the fusion center so as to minimize the probability of error in the decision of the fusion center.
(An alternative formulation of the problem, of the Neyman-Pearson type, will also be considered
in the last section.)

The above defined problem and its variants have been the subject of a fair amount of recent
research [TS, E, TA, LS], especially for the case of binary hypotheses ($M = 2$) and binary messages
($D = 2$). For the latter case, it is known that any optimal set of decision rules has the following
structure. Each one of the sensors evaluates its message $u_i$ using a likelihood ratio test with an
appropriate threshold. Then, the fusion center makes its decision by performing a final likelihood
ratio test. (Here, the messages received by the center play the role of its observations.) Without
the conditional independence assumption we introduced, this result fails to hold and the problem is
intractable (NP-hard), even for the case of two sensors [TA]. Assuming conditional independence,
the optimal value of the threshold of each sensor may be obtained by finding all solutions of a set of
coupled algebraic equations (which are the person-to-person optimality conditions for this problem)
and by selecting the solution which results in the least cost. Unfortunately (and contrary to intuition),
even if the observations of each sensor are identically distributed (given either hypothesis), it is
not true that all sensors should use the same threshold (see the Appendix for an example). This
renders the computation of the optimal thresholds intractable when the number of sensors is large.
To justify this last claim, consider what is involved in just evaluating the cost associated with a fixed
set $\gamma_0, \gamma_1, \ldots, \gamma_N$ of decision rules if each sensor uses a different threshold. In order to evaluate
the expected cost, we have to perform a summation over all possible values of $(u_1, \ldots, u_N)$, which
means that there are $2^N$ terms to be summed. (This is in contrast to the case of equal thresholds,
in which the $u_i$'s are identically distributed and therefore the binomial formula may be used to
obtain a sum with only $N + 1$ summands.) Of course, to determine an optimal set of decision rules
this effort may have to be repeated a number of times. This suggests that the computational effort
grows exponentially with the number $N$ of sensors.
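The contrast between the $2^N$-term and the $(N+1)$-term summations can be made concrete with a minimal sketch for $M = 2$, $D = 2$. The function names, the encoding of messages as 0/1 and the numerical values are illustrative assumptions, not part of the original text; the fusion center is assumed to use the MAP rule, so each message vector contributes the smaller of its two joint probabilities to the error.

```python
from itertools import product
from math import comb


def map_error_bruteforce(prior1, q1, q2):
    """Error probability of the MAP fusion rule, enumerating all 2**N vectors.

    q1[k] (resp. q2[k]) is the probability that sensor k sends message "1"
    under H1 (resp. H2); sensors may use different thresholds.
    """
    n = len(q1)
    err = 0.0
    for u in product((0, 1), repeat=n):  # u[k] == 1 means sensor k sent "1"
        p1, p2 = prior1, 1.0 - prior1
        for k, bit in enumerate(u):
            p1 *= q1[k] if bit else 1.0 - q1[k]
            p2 *= q2[k] if bit else 1.0 - q2[k]
        err += min(p1, p2)  # MAP keeps the larger joint probability
    return err


def map_error_binomial(prior1, q1, q2, n):
    """Same quantity when all n sensors share one rule: only n + 1 summands."""
    err = 0.0
    for m in range(n + 1):  # m = number of sensors that sent "1"
        p1 = prior1 * comb(n, m) * q1 ** m * (1.0 - q1) ** (n - m)
        p2 = (1.0 - prior1) * comb(n, m) * q2 ** m * (1.0 - q2) ** (n - m)
        err += min(p1, p2)
    return err
```

With identical sensors both functions return the same value, but the first one loops over `2**n` vectors while the second loops over `n + 1` counts, which is the gap the paragraph above describes.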

The above discussion motivates the main results of this paper, which show that, for the case
$M = 2$, $D = 2$, it is asymptotically optimal to have each sensor use the same threshold, and
which provide a simple method for computing the optimal threshold. For the general case of $M > 2$
hypotheses, it is no longer true, not even in the limit as $N \to \infty$, that each sensor should use
the same decision rule. Nevertheless, we show that, as $N \to \infty$, at most $M(M-1)/2$ different
decision rules need to be used by the sensors. The determination of an asymptotically optimal set
of decision rules is still a hard computational problem, except for the case where the observation
set $Y$ is finite and of small cardinality.

Notation: Throughout, $P_i$ will stand for the (conditional) measure $P(\cdot \mid H_i)$ on $(Y, \mathcal{F})$, under
hypothesis $H_i$. Furthermore, $E_i[\cdot]$ will stand for expectation with respect to the measure $P_i$.

II.  THE  BAYESIAN  PROBLEM. 

We start by noticing that, having fixed the decision rules $\gamma_1, \ldots, \gamma_N$ of the sensors, the optimal
decision for the fusion center is determined by using the maximum a posteriori probability (MAP)
rule. (The messages to the fusion center are to be thought of as measurements available to it.) Thus,
$\gamma_0$ is straightforward to determine in terms of $\gamma_1, \ldots, \gamma_N$. For this reason, we shall be concerned only
with the optimization with respect to $(\gamma_1, \ldots, \gamma_N)$. Any such set of decision rules will be denoted,
for convenience, by $\gamma^N$.
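As an illustration of how $\gamma_0$ follows from the sensors' rules, here is a minimal sketch of a MAP fusion rule. The data layout (`channels[k][j][d]`) and the 0-based indexing of hypotheses and messages are assumptions made for this sketch only.

```python
import math


def map_decision(messages, priors, channels):
    """Return the (0-based) hypothesis index chosen by the MAP rule.

    messages[k]       : message (0-based) received from sensor k
    priors[j]         : prior probability of hypothesis j
    channels[k][j][d] : probability that sensor k sends message d under
                        hypothesis j (induced by that sensor's decision rule)
    """
    best_j, best_score = 0, -math.inf
    for j, prior in enumerate(priors):
        # Log of prior times the product of message probabilities; the
        # product form relies on conditional independence of the sensors.
        score = math.log(prior)
        for k, u in enumerate(messages):
            p = channels[k][j][u]
            score += math.log(p) if p > 0 else -math.inf
        if score > best_score:
            best_j, best_score = j, score
    return best_j
```

Working with logarithms avoids underflow when the number of sensors is large, which is the regime of interest here.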

We introduce some more notation. Let $\Gamma$ be a set of decision rules among which the decision
rules of each sensor are to be selected. In general, we should take $\Gamma$ to be the set of all (measurable)
functions from $Y$ into the set $\{1, \ldots, D\}$. However, we may, for some reason, wish to restrict to a
smaller class of decision rules, possibly having some simplifying structure. We return to this issue
in Section III. Let $\Gamma^N$ be the Cartesian product of $\Gamma$ with itself, $N$ times. For any $\gamma^N \in \Gamma^N$, let
$J_N(\gamma^N)$ be the probability of an erroneous final decision by the fusion center (always assuming
that the fusion center uses the MAP rule). We are concerned with the minimization of $J_N(\gamma^N)$,
over all $\gamma^N \in \Gamma^N$, when $N$ is very large.

It is easy to show that, as the number of sensors grows to infinity, the probability of error goes
to zero, for any reasonable set of decision rules, in fact exponentially fast. Consequently, we need
a more refined way of comparing different sets of decision rules, as $N \to \infty$. To this effect, for
any given value of $N$ and any set $\gamma^N$ of decision rules for the $N$-sensor problem, we consider the
exponent of the error probability defined by

$$ r_N(\gamma^N) = \frac{\log J_N(\gamma^N)}{N}. $$

Let $R_N = \inf_{\gamma^N \in \Gamma^N} r_N(\gamma^N)$ be the optimal exponent. Let $\Gamma_0^N$ be the set of all $\gamma^N \in \Gamma^N$ with
the property that the set $\{\gamma_1, \ldots, \gamma_N\}$ has at most $M(M-1)/2$ different elements. Let $Q_N =
\inf_{\gamma^N \in \Gamma_0^N} r_N(\gamma^N)$ be the optimal exponent when we restrict to sets of decision rules in $\Gamma_0^N$. The
following result shows that, asymptotically, optimality is not lost if we restrict to $\Gamma_0^N$.

Theorem 1: Subject to Assumption 1 below, $\lim_{N \to \infty} (Q_N - R_N) = 0$.

The  rest  of  this  section  is  devoted  to  the  proof  of  Theorem  1.  We  first  need  to  introduce  some 
auxiliary  tools. 

Let us fix some $\gamma \in \Gamma$. The mapping from the true hypothesis $H_i$ to the decision of a sensor
employing the decision rule $\gamma$ may be thought of as a noisy channel which is completely described
by the probabilities

$$ p_i^\gamma(d) = P_i(\gamma(y) = d). $$



The ability of such a channel to discriminate between hypotheses $H_i$ and $H_j$ ($i \neq j$) may be
quantified by a function $\mu_{ij}(\gamma, s)$, $s \in [0, 1]$, defined by the following formula [SGB]:

$$ \mu_{ij}(\gamma, s) = \log \left[ \sum_{d=1}^{D} \big(p_i^\gamma(d)\big)^{1-s} \big(p_j^\gamma(d)\big)^{s} \right]. \tag{1} $$

We use here the convention $0^0 = 0$; thus, the summation in (1) is to be performed only over those
$d$'s for which $p_i^\gamma(d)\, p_j^\gamma(d) \neq 0$. Assuming that $\mu_{ij}(\gamma, s)$ is not infinite, it is easy to see that $\mu_{ij}(\gamma, s)$
is infinitely differentiable, as a function of $s$, and its derivatives are continuous on $[0, 1]$, provided
that we define the derivative at an endpoint as the limit when we approach the endpoint from the
interior.
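Formula (1) is straightforward to evaluate numerically. The sketch below computes $\mu_{ij}(\gamma, s)$ from the two message distributions and minimizes it over $s \in [0,1]$ by ternary search, which is adequate because the function is convex in $s$ [SGB]. The function names and the choice of ternary search are assumptions of this sketch; it also assumes that the two distributions share at least one message of positive probability, so that the logarithm is finite.

```python
import math


def mu(p_i, p_j, s):
    """mu_ij(gamma, s) = log sum_d p_i(d)**(1 - s) * p_j(d)**s, as in (1).

    p_i, p_j list the message probabilities under H_i and H_j.
    Terms with p_i(d) * p_j(d) == 0 are skipped (convention 0**0 = 0).
    """
    total = sum(a ** (1.0 - s) * b ** s for a, b in zip(p_i, p_j) if a * b > 0)
    return math.log(total)


def minimize_mu(p_i, p_j, tol=1e-10):
    """Ternary search for a minimizer of s -> mu(p_i, p_j, s) over [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if mu(p_i, p_j, m1) <= mu(p_i, p_j, m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    s_star = 0.5 * (lo + hi)
    return s_star, mu(p_i, p_j, s_star)
```

For the symmetric pair `[0.8, 0.2]` and `[0.2, 0.8]`, the minimizer is near $s = 1/2$ and the minimum equals $\log(2\sqrt{0.16}) = \log 0.8$.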

Notice that, for any fixed $\gamma$, the function $\mu_{ij}(\gamma, s)$ is equal to $\log E[e^{sX}]$, where $X =
\log\big(p_j^\gamma(d)/p_i^\gamma(d)\big)$ is the log-likelihood ratio of the distributions $p_j^\gamma(\cdot)$ and $p_i^\gamma(\cdot)$, and where the
expectation is with respect to the distribution $p_i^\gamma(\cdot)$. As is well known, minimizing the moment
generating function of a random variable $X$ yields tight bounds on the probability of large deviations
of $X$ from its mean. Since in this case $X$ is the log-likelihood ratio, this method leads to tight bounds
on the probability of error. One particular such result that we will use is taken from [SGB]:

Lemma 1: Let there be two hypotheses $H'$ and $H''$. Let $z_i$, $i = 1, \ldots, N$, be measurements taking values
in a finite set $\{1, \ldots, D\}$, which are conditionally independent given the true hypothesis, and suppose
that the conditional distribution of $z_i$, when $H$ is true, is described by $p_i^H(d) = P(z_i = d \mid H)$. Let

$$ \mu(i, s) = \log \left[ \sum_{d=1}^{D} \big(p_i^{H'}(d)\big)^{1-s} \big(p_i^{H''}(d)\big)^{s} \right] $$

and $\mu(s) = \sum_{i=1}^{N} \mu(i, s)$. Assume that $\mu(i, s)$, $\mu'(i, s)$, $\mu''(i, s)$ exist and are finite, where a prime
stands for differentiation with respect to $s$. Let $s^*$ minimize $\mu(s)$ over $s \in [0, 1]$. Then,

a) There exists a decision rule for deciding between $H'$ and $H''$, on the basis of the measurements
$z_1, \ldots, z_N$, for which

$$ P(\text{decide } H' \mid H'' \text{ is true}) + P(\text{decide } H'' \mid H' \text{ is true}) \le 2 \exp\{\mu(s^*)\}. $$

b) For any rule for deciding between $H'$ and $H''$, on the basis of the measurements $z_1, \ldots, z_N$, we
have

$$ P(\text{decide } H' \mid H'' \text{ is true}) + P(\text{decide } H'' \mid H' \text{ is true}) \ge \tfrac{1}{2} \exp\big\{\mu(s^*) - [2\mu''(s^*)]^{1/2}\big\}, $$

where a prime indicates differentiation with respect to $s$.

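Proof: Part (a) of the Lemma is the Corollary on p. 84 of [SGB]. For part (b), it is shown in [SGB]
(equation (3.42), p. 87) that

$$ P(\text{decide } H' \mid H'' \text{ is true}) + P(\text{decide } H'' \mid H' \text{ is true}) \ge $$
$$ \tfrac{1}{4} \exp\big\{\mu(s) - s\mu'(s) - s[2\mu''(s)]^{1/2}\big\} + \tfrac{1}{4} \exp\big\{\mu(s) + (1-s)\mu'(s) - (1-s)[2\mu''(s)]^{1/2}\big\}, \quad \forall s \in (0, 1). $$

If $s^* \in (0, 1)$, we have $\mu'(s^*) = 0$ and the desired result follows immediately. If $s^* = 0$, we may
take the limit in the above inequality, as $s \downarrow 0$. Since $\mu'$ and $\mu''$ are continuous, and therefore bounded, we
have $\lim_{s \downarrow 0} s\mu'(s) = 0$ and $\lim_{s \downarrow 0} s[2\mu''(s)]^{1/2} = 0$, which yields

$$ P(\text{decide } H' \mid H'' \text{ is true}) + P(\text{decide } H'' \mid H' \text{ is true}) \ge \tfrac{1}{4} \exp\{\mu(0)\} + \tfrac{1}{4} \exp\big\{\mu(0) + \mu'(0) - [2\mu''(0)]^{1/2}\big\} \ge \tfrac{1}{2} \exp\big\{\mu(0) - [2\mu''(0)]^{1/2}\big\}. $$

The last inequality follows because $\mu'(0) \ge 0$ (since $s^* = 0$ minimizes $\mu$ over $[0, 1]$) and because
$\mu$ is convex and therefore $\mu''(s) \ge 0$, $\forall s$. The argument for the case $s^* = 1$ is identical. •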
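The upper bound of part (a) can be checked numerically in a simple setting. The sketch below assumes $N$ identical sensors with binary messages and uses the maximum-likelihood rule, for which the sum of the two error probabilities equals $\sum_z \min\{P(z \mid H'), P(z \mid H'')\}$. This rule is one convenient choice made for illustration; it is not claimed to be the rule constructed in [SGB]. The grid over $s$ and all names are likewise assumptions.

```python
import math


def mu_single(p1, p2, s):
    """Per-sensor mu(i, s) for message distributions p1 (H') and p2 (H'')."""
    return math.log(sum(a ** (1 - s) * b ** s for a, b in zip(p1, p2) if a * b > 0))


def sum_of_errors_ml(p1, p2, n):
    """P(decide H' | H'') + P(decide H'' | H') for the maximum-likelihood
    rule with n identical binary-message sensors (binomial counting)."""
    total = 0.0
    for m in range(n + 1):  # m sensors sent the first message
        c = math.comb(n, m)
        like1 = c * p1[0] ** m * p1[1] ** (n - m)
        like2 = c * p2[0] ** m * p2[1] ** (n - m)
        total += min(like1, like2)
    return total


def chernoff_upper_bound(p1, p2, n, grid=2001):
    """2 * exp(mu(s)) with mu(s) = n * mu_single(s), minimized over a grid.

    A grid minimum is never below the true minimum, so the returned value
    is still a valid (slightly looser) upper bound.
    """
    mu_min = min(n * mu_single(p1, p2, k / (grid - 1)) for k in range(grid))
    return 2.0 * math.exp(mu_min)
```

Comparing `sum_of_errors_ml(p1, p2, n)` with `chernoff_upper_bound(p1, p2, n)` for increasing `n` shows both quantities decaying exponentially, with the bound always on top.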

The bounds of parts (a) and (b) of the Lemma could be far apart if $\mu''$ is left uncontrolled. For
this reason we introduce the following assumption:

Assumption 1: a) $|\mu_{ij}(\gamma, s)| < \infty$, $\forall \gamma \in \Gamma$, $\forall i \neq j$, $\forall s \in [0, 1]$.

b) There exists a constant $A$ such that $|\mu''_{ij}(\gamma, s)| \le A$, $\forall s \in [0, 1]$, $\forall \gamma \in \Gamma$, $\forall i \neq j$.

The content of this Assumption is explored in Section V; it is shown there that it corresponds
to some minor restrictions on the distribution of the observations, which are satisfied in typical
situations of practical interest.


As a preview of the remainder of the proof, we use Lemma 1, for each pair of distinct hypotheses,
to argue that the decision rules $\gamma_1, \ldots, \gamma_N$ of the sensors should be chosen so as to minimize

$$ \max_{\{(i,j):\, i \neq j\}} \ \min_{s \in [0,1]} \ \sum_{k=1}^{N} \mu_{ij}(\gamma_k, s). $$

We reformulate this as a linear programming problem and use linear programming theory to show
that a small number of different $\gamma_k$'s suffices.


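Let $\mathcal{F}$ be the set of all finite subsets of $\Gamma$. For any $F \in \mathcal{F}$, let

$$ \Lambda(F) = \min_{x_\gamma} \ \max_{\{(i,j):\, i \neq j\}} \ \min_{s \in [0,1]} \ \sum_{\gamma \in F} x_\gamma \mu_{ij}(\gamma, s), $$

where the minimization with respect to $x_\gamma$ is subject to the constraints

$$ x_\gamma \ge 0, \quad \forall \gamma \in F, \tag{2a} $$

$$ \sum_{\gamma \in F} x_\gamma = 1. \tag{2b} $$

Let

$$ \Lambda^* = \inf_{F \in \mathcal{F}} \Lambda(F). $$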

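Let us fix $N$ and some collection $\gamma^N \in \Gamma^N$ of decision rules. Let $a = \min_i P(H_i)$. We then have,
using part (b) of Lemma 1,

$$ J_N(\gamma^N) = \sum_{\{(i,j):\, i \neq j\}} P(\text{decide } H_i \mid H_j)\, P(H_j) \ \ge\ \frac{a}{2} \max_{\{(i,j):\, i \neq j\}} \exp\left\{ \sum_{k=1}^{N} \mu_{ij}(\gamma_k, s^*_{ij}) - \left( 2 \sum_{k=1}^{N} \mu''_{ij}(\gamma_k, s^*_{ij}) \right)^{1/2} \right\}, $$

where $s^*_{ij}$ minimizes $\sum_{k=1}^{N} \mu_{ij}(\gamma_k, s)$ over $s \in [0, 1]$. Let $F$ be the set of different decision rules
(elements of $\Gamma$) which are present in the collection $\gamma^N$ of decision rules. For each $\gamma \in F$, let $x_\gamma$ be
the proportion of the sensors using decision rule $\gamma$; that is, $x_\gamma$ is equal to the number of $k$'s such
that $\gamma_k = \gamma$, divided by $N$. By construction, the coefficients $x_\gamma$ satisfy the constraints (2a)-(2b).
Using Assumption 1(b) to bound $\mu''_{ij}(\gamma_k, s^*_{ij})$, the definition of $s^*_{ij}$ and the definition of $\Lambda(F)$, we have

$$ J_N(\gamma^N) \ \ge\ \frac{a}{2} \exp\left\{ \max_{\{(i,j):\, i \neq j\}} \ \min_{s \in [0,1]} \ N \sum_{\gamma \in F} x_\gamma \mu_{ij}(\gamma, s) - (2NA)^{1/2} \right\} \ \ge\ \frac{a}{2}\, e^{N\Lambda(F) - (2NA)^{1/2}} \ \ge\ \frac{a}{2}\, e^{N\Lambda^* - (2NA)^{1/2}}. $$

This shows that $R_N \ge \Lambda^* - (2A/N)^{1/2} + \frac{1}{N} \log(a/2)$. Taking the limit as $N \to \infty$, we obtain

$$ \liminf_{N \to \infty} R_N \ge \Lambda^*. \tag{3} $$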


Lemma 2: $\Lambda^* = \inf_{F \in \mathcal{F}_0} \Lambda(F)$, where $\mathcal{F}_0$ is the collection of all subsets of $\Gamma$ of cardinality no larger
than $M(M-1)/2$.

Proof: Given some $F \in \mathcal{F}$, let $s^*_{ij}$, $x^*_\gamma$ be such that the constraints (2a), (2b) are satisfied and

$$ \Lambda(F) = \max_{\{(i,j):\, i \neq j\}} \ \sum_{\gamma \in F} x^*_\gamma \mu_{ij}(\gamma, s^*_{ij}). $$

(Such $s^*_{ij}$, $x^*_\gamma$ exist because the quantity $\max_{\{(i,j):\, i \neq j\}} \sum_{\gamma \in F} x_\gamma \mu_{ij}(\gamma, s_{ij})$ is continuous in $s_{ij}$,
$x_\gamma$ and is defined over a compact set; therefore, the minimum arising in the definition of $\Lambda(F)$
is attained.) In particular, if the $s^*_{ij}$'s are fixed, then the $x^*_\gamma$'s are determined by minimizing
$\max_{\{(i,j):\, i \neq j\}} \sum_{\gamma \in F} x_\gamma \mu_{ij}(\gamma, s^*_{ij})$, subject to the constraints (2a)-(2b). This minimization is
equivalent to the following linear programming problem:

$$ \min \lambda $$

subject to

$$ \sum_{\gamma \in F} x_\gamma \mu_{ij}(\gamma, s^*_{ij}) \le \lambda, \quad \forall i \neq j, $$

$$ x_\gamma \ge 0, \quad \forall \gamma \in F, $$

$$ \sum_{\gamma \in F} x_\gamma = 1. $$

Let $T$ be the cardinality of the set $F$. The above defined linear program has $T + 1$ variables and
$T + 1 + M(M-1)/2$ constraints. From linear programming theory [PS], we know that there exists an
optimal solution at which the number of constraints for which equality holds is no smaller than the
number of variables. Therefore, with this optimal solution at most $M(M-1)/2$ of the constraints
hold with a strict inequality, which implies that at most $M(M-1)/2$ of the $x_\gamma$'s are nonzero.
Therefore, for any $F \in \mathcal{F}$ there exists some $F' \in \mathcal{F}_0$ such that $\Lambda(F') \le \Lambda(F)$. This completes the
proof of Lemma 2. •

Let us fix some $N$ and some $\epsilon > 0$. Let $F$ be a subset of $\Gamma$ of cardinality no larger than
$M(M-1)/2$ (that is, $F \in \mathcal{F}_0$), such that $\Lambda(F) \le \Lambda^* + \epsilon$, which exists because of Lemma 2. Let
$x^*_\gamma$ and $s^*_{ij}$ be such that

$$ \max_{\{(i,j):\, i \neq j\}} \ \sum_{\gamma \in F} x^*_\gamma \mu_{ij}(\gamma, s^*_{ij}) = \Lambda(F) \le \Lambda^* + \epsilon. $$

We now define a collection $\gamma^N$ of decision rules to be used by the $N$ sensors: for each $\gamma \in F$, we
let exactly $\lfloor N x^*_\gamma \rfloor$ of them use the decision rule $\gamma$; if there are any remaining sensors, which is the
case if $N x^*_\gamma$ is not an integer for some $\gamma$, we let these sensors use an arbitrary decision rule out of
the set $F$. Let $N_0$ be the number of these remaining sensors.
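The allocation just described can be sketched in a few lines. The function name and the use of exact fractions (to keep the floor operation free of rounding error) are assumptions of this sketch.

```python
import math
from fractions import Fraction


def allocate_sensors(proportions, n):
    """Assign floor(n * x) sensors to each decision rule in F.

    proportions : dict mapping a rule label to its optimal proportion x,
                  given as a Fraction, with the proportions summing to 1
    n           : total number of sensors N
    Returns (counts, n0), where n0 is the number of remaining sensors
    (N_0 in the text), which may use any rule in F.
    """
    counts = {rule: math.floor(n * x) for rule, x in proportions.items()}
    n0 = n - sum(counts.values())
    return counts, n0
```

Because each fractional part $N x^*_\gamma - \lfloor N x^*_\gamma \rfloor$ is below one, `n0` is always smaller than the number of rules in $F$, and hence bounded independently of $N$.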

We now estimate the probability of error under this particular $\gamma^N$. The probability of error is
bounded above by the probability of error for the case where the fusion center chooses to ignore
the messages transmitted by the last $N_0$ sensors, and this is what we will assume. We now have

$$ J_N(\gamma^N) \le \sum_{\{(i,j):\, i \neq j\}} P(\text{decide } H_i \mid H_j \text{ is true})\, P(H_j) \le M^2 \max_{\{(i,j):\, i \neq j\}} \big[ P(\text{decide } H_i \mid H_j \text{ is true}) + P(\text{decide } H_j \mid H_i \text{ is true}) \big]. \tag{4} $$

The expression inside the brackets in the right hand side of (4) refers to the probabilities of error
for a context in which $H_i$ and $H_j$ are the only hypotheses. Since the fusion center uses the MAP
rule, it is using a decision rule which would be optimal even if it had to discriminate only between
the two hypotheses $H_i$ and $H_j$ (always assuming that the last $N_0$ messages are ignored). Thus, for
each pair of hypotheses, the upper bound on the probability of error furnished by Lemma 1(a) is
applicable. This yields

$$ J_N(\gamma^N) \le 2M^2 \max_{\{(i,j):\, i \neq j\}} \exp\left\{ \sum_{\gamma \in F} \lfloor N x^*_\gamma \rfloor\, \mu_{ij}(\gamma, s^*_{ij}) \right\}. \tag{5} $$

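We now use the inequality $N x^*_\gamma - \lfloor N x^*_\gamma \rfloor \le 1$ to obtain

$$ \sum_{\gamma \in F} \lfloor N x^*_\gamma \rfloor\, \mu_{ij}(\gamma, s^*_{ij}) \le \sum_{\gamma \in F} N x^*_\gamma\, \mu_{ij}(\gamma, s^*_{ij}) + \sum_{\gamma \in F} |\mu_{ij}(\gamma, s^*_{ij})| \le \sum_{\gamma \in F} N x^*_\gamma\, \mu_{ij}(\gamma, s^*_{ij}) + K, $$

where $K$ is a constant independent of $N$. We substitute the above inequality in the right hand side
of (5), then take logarithms and divide by $N$ to obtain

$$ Q_N \le \frac{\log J_N(\gamma^N)}{N} \le \frac{2 \log M}{N} + \frac{\log 2 + K}{N} + \Lambda(F) \le \Lambda^* + \epsilon + \frac{K'}{N}, $$

where $K'$ is another constant independent of $N$. We take the limit as $N \to \infty$ and use the fact
that $\epsilon$ was arbitrary to conclude that $\limsup_{N \to \infty} Q_N \le \Lambda^*$. We combine this inequality with (3)
and the obvious inequality $R_N \le Q_N$ to complete the proof of the theorem. •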




III.  SPECIAL  CASES  AND  COMPUTATIONAL  CONSIDERATIONS. 

Let us start by stressing that the proof of Theorem 1 is constructive and suggests a procedure
for determining an asymptotically optimal set of decision rules. Namely, we have to solve the
optimization problem defining $\Lambda^*$. The value of $\Lambda^*$ is the optimal exponent and the associated
optimal values of the $x_\gamma$'s are the proportions of the sensors that should use each decision rule $\gamma$.

Theorem 1 is most useful in the case of binary hypotheses ($M = 2$) and binary messages ($D = 2$).
For that case it is known [TS] that, without any loss of optimality, we may assume that each sensor
decides what to transmit by performing a likelihood ratio test, with an appropriate threshold. We
thus let $\Gamma$ be the set of all such decision rules. Furthermore, in this case we have $M(M-1)/2 = 1$
and Theorem 1 implies that it is asymptotically optimal to let every sensor use the same threshold.
In order to compute $\Lambda^*$ we only need to optimize over all subsets of $\Gamma$ of cardinality 1. Therefore,
the optimal threshold may be computed by solving the optimization problem

$$ \min_{\gamma \in \Gamma} \ \min_{s \in [0,1]} \ \mu_{12}(\gamma, s). \tag{6} $$

Notice that each $\gamma \in \Gamma$ can be described by a single real number, the value of the threshold being
employed. We are therefore dealing with a nonlinear optimization problem in two dimensions. In
typical problems, the probabilities $p_i^\gamma(d)$ are given by simple analytical expressions, as a function
of the threshold corresponding to $\gamma$. Therefore, simple analytical expressions are available for
$\mu_{ij}(\gamma, s)$ as well. It is known that $\mu_{ij}(\gamma, \cdot)$ is a convex function of $s$, for every $\gamma$ [SGB], which
makes the optimization with respect to $s$ easier. Unfortunately, we are not aware of any simple but
nontrivial examples in which the solution of the above optimization problem and the corresponding
value of the optimal threshold may be obtained analytically.
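Problem (6) is easy to attack numerically. As a minimal sketch, assume (purely for illustration, this model is not taken from the text) that the observation is Gaussian with unit variance and mean 0 under $H_1$ and mean 1 under $H_2$. A likelihood ratio test then reduces to comparing $y$ with a threshold $t$, and a plain grid search over $(t, s)$ approximates the minimum in (6). All names and grid ranges below are assumptions.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mu12(t, s, mean1=0.0, mean2=1.0):
    """mu_12(gamma, s) when gamma sends the first message iff y <= t."""
    p1 = (std_normal_cdf(t - mean1), 1.0 - std_normal_cdf(t - mean1))
    p2 = (std_normal_cdf(t - mean2), 1.0 - std_normal_cdf(t - mean2))
    return math.log(sum(a ** (1 - s) * b ** s for a, b in zip(p1, p2)))


def solve_problem_6(t_grid, s_grid):
    """Grid search for (6); returns (value, threshold, s) at the grid minimum."""
    return min((mu12(t, s), t, s) for t in t_grid for s in s_grid)


t_grid = [k / 20 - 3 for k in range(141)]  # thresholds from -3 to 4
s_grid = [k / 100 for k in range(101)]     # s from 0 to 1
```

A call such as `solve_problem_6(t_grid, s_grid)` returns an approximate optimal exponent together with the threshold and the value of $s$ attaining it; a finer grid, or a one-dimensional convex search over $s$ for each threshold, would refine the answer.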

In the case of binary hypotheses ($M = 2$) and messages of arbitrary cardinality $D > 2$, it is
known that likelihood ratio tests are again optimal, except that each decision rule consists of $D - 1$
thresholds which determine which one of the $D$ messages is to be sent. The same discussion as
for the case of $D = 2$ applies here and (asymptotically) each sensor should use the same set of
thresholds. The only difference is that $\gamma$ is parametrized by a $(D-1)$-dimensional real vector (as
opposed to a scalar). Thus, the problem (6), which needs to be solved in order to determine the
optimal thresholds, is a $D$-dimensional optimization problem. This may become quite hard unless
$D$ is small, the reason being that, in general, $\mu_{12}(\gamma, s)$ is not a convex function of the parameters
specifying $\gamma$.

For the case where $M > 2$, Theorem 1 is less useful for computing an asymptotically optimal set
of decision rules. The reason is that we have to solve an optimization problem over all subsets
of $\Gamma$ of cardinality $M(M-1)/2$. In principle, it seems possible to reformulate the optimization
problem defining $\Lambda^*$ in a way that avoids having to consider each such subset of $\Gamma$ (which would
be impossible anyway if $\Gamma$ is infinite). Namely, we might perform the minimization

$$ \min_{x \in P} \ \max_{\{(i,j):\, i \neq j\}} \ \min_{s \in [0,1]} \ \int_\Gamma \mu_{ij}(\gamma, s)\, dx(\gamma), $$

where $x(\cdot)$ is a positive measure on $\Gamma$ with $x(\Gamma) = 1$ and where $P$ is the set of all such measures.
Leaving aside the technical difficulties in showing that this is an equivalent problem, it still does
not seem particularly promising from a computational point of view. It appears that the only cases
in which a numerical solution is possible are those cases in which the set $Y$ is finite and has small
cardinality, because in that case $\Gamma$ is also finite and has small cardinality. Notice that if $F_1 \subset F_2$,
then $\Lambda(F_2) \le \Lambda(F_1)$. Therefore, if $\Gamma$ is finite, we have $\Lambda^* = \Lambda(\Gamma)$. This suggests that in order
to compute $\Lambda^*$ it is preferable to ignore Theorem 1: instead of computing $\Lambda(F)$ for each $F$ of
cardinality $M(M-1)/2$, and then taking the minimum, we may just compute $\Lambda(\Gamma)$.

An Example: Let $M = 3$, $D = 2$ and let $Y = \{1, 2, 3\}$. Let each hypothesis be equally likely
and let the statistics of the observation $y$ be as follows: conditioned on $H_i$ being true, $y$ takes the
value $i$ with probability $1 - 2\epsilon$ and takes each one of the remaining two values with probability $\epsilon$
($0 < \epsilon < 1/4$). There are three possible decision rules. The $i$-th possible decision rule is: $\gamma_i(y) = 1$ if
and only if $y = i$. Notice that $\gamma_1$ does not provide any information useful in discriminating between
$H_2$ and $H_3$. Thus, $\mu_{23}(\gamma_1, s) = 0$, $\forall s$; similarly, $\mu_{12}(\gamma_3, s) = \mu_{13}(\gamma_2, s) = 0$, $\forall s$. Furthermore,
by symmetry, $\mu_{12}(\gamma_1, s) = \mu_{13}(\gamma_1, s) = \mu_{23}(\gamma_2, s)$, etc. Let $\alpha$ be the value of the minimum of
$\mu_{12}(\gamma_1, s)$, over $s \in [0, 1]$. Let $x_i$ be the proportion of sensors using $\gamma_i$. The optimal values of
$x_1, x_2, x_3$ are determined by solving the problem

$$ \alpha \max_{x_1, x_2, x_3} \ \min\{x_1 + x_2,\ x_1 + x_3,\ x_2 + x_3\}, $$

over the unit simplex. It is easy to see that the optimal solution is $x_1 = x_2 = x_3 = 1/3$, exactly as
expected from the symmetry of the problem, and the corresponding value of the optimal exponent
$\Lambda^*$ is $2\alpha/3$.
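The example can be reproduced numerically. Taking the reduction stated in the text as given, the sketch below evaluates $\alpha$ on a grid of values of $s$ and searches a discretized unit simplex for the proportions that maximize the smallest pairwise sum. The grid sizes and function names are assumptions of this sketch.

```python
import math


def alpha(eps, grid=10001):
    """Grid approximation of min over s of mu_12(gamma_1, s)."""
    p1 = (1 - 2 * eps, 2 * eps)  # messages (1, 2) under H_1 when gamma_1 is used
    p2 = (eps, 1 - eps)          # messages (1, 2) under H_2 when gamma_1 is used
    return min(
        math.log(sum(a ** (1 - s) * b ** s for a, b in zip(p1, p2)))
        for s in (k / (grid - 1) for k in range(grid))
    )


def best_proportions(n=30):
    """Search the simplex grid {(i, j, k) / n} for the maximizer of
    min{x1 + x2, x1 + x3, x2 + x3}. Returns (value, (x1, x2, x3))."""
    best = None
    for i in range(n + 1):
        for j in range(n + 1 - i):
            k = n - i - j
            value = min(i + j, i + k, j + k)  # in units of 1 / n
            if best is None or value > best[0]:
                best = (value, (i, j, k))
    value, counts = best
    return value / n, tuple(c / n for c in counts)
```

With `n` divisible by 3 the search returns the uniform proportions and the value 2/3, so that `alpha(eps) * best_proportions()[0]` gives the exponent $2\alpha/3$ quoted above.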

IV.  ALTERNATIVE  INTERPRETATIONS. 

Theorem 1 may be restated in a different language referring to a different context. For simplicity,
we only consider the case $M = 2$. Suppose that we want to transmit a binary message and that we
have a collection of noisy, memoryless and independent channels at our disposal. We are allowed
to transmit a total of $N$ times, using any of the available channels each time. A receiver observes
the $N$ outputs of the channels, uses its knowledge of which channels were being used, and makes a
decision on what was transmitted. The problem consists of finding which channels should be used
and how many times each, in order to maximize the probability of correct decoding. For small
$N$, it may be better to use a different channel each time, even if the original message is binary.
However, our result states that, for binary messages, as $N \to \infty$, there is a single best channel
which should be used for all transmissions. To see the analogy, think of the hypothesis $H_1$ or $H_2$ as
the value of the binary message which we want to transmit and think of $u_i$ as the output of the $i$-th
transmission. A different channel corresponds to a different decision rule and the characteristics of
the channel correspond to the quantities $p_i^\gamma(d)$.

A different analogy may be made in the context of optimal design of measurements for failure
detection. Suppose that we have a system which may be in one of two states: up or down. We
have a collection of devices which may be used for failure detection. They are, however, unreliable
and may make errors of both types. Furthermore, the probabilities of either type of error can be
different for different devices. Suppose that, in order to increase reliability, we want to use $N$ such
devices. Then, our result states that, as $N \to \infty$, there exists a single best device and that we
should use $N$ replicas of it, rather than using many devices with different characteristics.


In this section we explore Assumption 1. Our objective here is to obtain conditions on the
distributions $P_i$ under which Assumption 1 can be shown to hold. Proposition 1 below deals with
Assumption 1(a).

Proposition 1: Assumption 1(a) fails to hold if and only if there are two hypotheses $H_i$, $H_j$, such
that the corresponding measures $P_i$ and $P_j$ are mutually singular.*

Proof: Suppose that Assumption 1(a) fails. Then, there exist some $i$, $j$ and some $\gamma \in \Gamma$ for which
$p_i^\gamma(d)\, p_j^\gamma(d) = 0$, $\forall d \in \{1, \ldots, D\}$. Thus, for any $d \in \{1, \ldots, D\}$, the set $\{y \in Y : \gamma(y) = d\}$ has
non-zero measure under $P_i$ only if it has zero measure under $P_j$. Since the sets $\{y \in Y : \gamma(y) = d\}$
cover the entire set $Y$, it follows that $P_i$ and $P_j$ are mutually singular. •


As a consequence of Proposition 1, we can see that if there are only two hypotheses and
Assumption 1(a) fails to hold, we are dealing with the uninteresting situation where each sensor is able
to determine the true hypothesis on its own, with zero probability of error. For the case of more
than two hypotheses, however, there are nontrivial detection problems in which Assumption 1(a)
fails to hold. We conjecture that a somewhat modified version of Theorem 1, covering such a case,
is possible. We now explore Assumption 1(b) and show that it holds for two interesting situations.

Proposition 2: Suppose that the observation set $Y$ is finite and that Assumption 1(a) holds.
Then Assumption 1(b) also holds.

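Proof: The derivatives of $\mu_{ij}(\gamma, s)$, with respect to $s$, are easily calculated to be [SGB, equations
(3.24)-(3.25)]:

$$ \mu'_{ij}(\gamma, s) = \sum_{d=1}^{D} \frac{\big(p_i^\gamma(d)\big)^{1-s} \big(p_j^\gamma(d)\big)^{s}}{\sum_{c=1}^{D} \big(p_i^\gamma(c)\big)^{1-s} \big(p_j^\gamma(c)\big)^{s}} \, \log \frac{p_j^\gamma(d)}{p_i^\gamma(d)}, \tag{7} $$

$$ \mu''_{ij}(\gamma, s) = \sum_{d=1}^{D} \frac{\big(p_i^\gamma(d)\big)^{1-s} \big(p_j^\gamma(d)\big)^{s}}{\sum_{c=1}^{D} \big(p_i^\gamma(c)\big)^{1-s} \big(p_j^\gamma(c)\big)^{s}} \left( \log \frac{p_j^\gamma(d)}{p_i^\gamma(d)} \right)^{2} - \big[\mu'_{ij}(\gamma, s)\big]^{2}, \tag{8} $$

where all summations are made over those $c$'s and $d$'s for which $p_i^\gamma(c)\, p_j^\gamma(c)$ (respectively, $p_i^\gamma(d)\, p_j^\gamma(d)$)
is nonzero.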


* Two positive measures $P_i$, $P_j$, defined on a common (measurable) space $Y$ are called mutually
singular if there exists a measurable subset $U$ of $Y$ such that $P_i(U) = P_j(Y - U) = 0$.


Let $a$ be the minimum of $p_i^\gamma(c)$, where the minimum is taken over all choices of $\gamma$, $c$, $i$, such that
$p_i^\gamma(c) > 0$. Since $Y$ is finite, the set of all possible decision rules $\gamma$ is also finite and therefore $a$
is the minimum of finitely many positive quantities and is itself positive. By Assumption 1(a) the
denominator in equation (7) must have a nonzero summand and this summand will be bounded
below by $a^{1-s} a^{s} = a$. The numerator is bounded by $D$. Concerning the logarithmic term, it is
bounded, in absolute value, by $|\log a|$, for any $d$ in the range of the summation. We conclude that
$\mu'_{ij}(\gamma, s)$ is bounded in absolute value by a constant independent of $i$, $j$, $\gamma$, $s$. A similar argument
applies to $\mu''_{ij}(\gamma, s)$ and concludes the proof. •

Proposition 3: Suppose that, for any $i$, $j$, the measure $P_i$ is absolutely continuous with respect
to $P_j$ and let $L_{ij}$ denote the Radon-Nikodym derivative $dP_i/dP_j$. Assume that

$$ E_i\big[\log^2 L_{ij}\big] < \infty, \quad \forall i, j. \tag{9} $$

Then Assumption 1 holds.

Proof: The fact that Assumption 1(a) holds is immediate from our assumption of absolute
continuity and Proposition 1.

For any decision rule $\gamma : Y \to \{1, \ldots, D\}$, let $\mathcal{F}^\gamma$ be the smallest $\sigma$-field contained in $\mathcal{F}$ with
respect to which the function $\gamma$ is measurable. Let $P_i^\gamma$ denote the restriction of the measure $P_i$ on
the $\sigma$-field $\mathcal{F}^\gamma$. It follows from the absolute continuity assumption that $P_i^\gamma$ is absolutely continuous
with respect to $P_j^\gamma$. We define $L^\gamma_{ij}$ to be equal to the Radon-Nikodym derivative $dP_i^\gamma/dP_j^\gamma$. As is
well known,

$$ L^\gamma_{ij} = E_j\big[L_{ij} \mid \mathcal{F}^\gamma\big], \quad \text{a.s. } [P_j]. \tag{10} $$

Consider the function $\phi : (0, \infty) \to (0, \infty)$ defined by $\phi(t) = t \log^2 t$. An easy calculation shows
that it is convex. Therefore, using (10) and Jensen's inequality,

$$ E_i\big[\log^2 L^\gamma_{ij}\big] = E_j\big[L^\gamma_{ij} \log^2 L^\gamma_{ij}\big] = E_j\big[\phi(L^\gamma_{ij})\big] = E_j\big[\phi\big(E_j[L_{ij} \mid \mathcal{F}^\gamma]\big)\big] \le $$
$$ E_j\big[E_j[\phi(L_{ij}) \mid \mathcal{F}^\gamma]\big] = E_j\big[L_{ij} \log^2 L_{ij}\big] = E_i\big[\log^2 L_{ij}\big]. $$

Using (9), we conclude that there exists a constant $B < \infty$ such that $E_i[\log^2 L^\gamma_{ij}] \le B$, $\forall \gamma, i, j$.
Using the inequality $E[|x|] \le 1 + E[x^2]$, we obtain the same conclusion for $E_i[|\log L^\gamma_{ij}|]$.

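Notice now that $L^{\gamma}_{ij}(y) = p_i^{\gamma}(d)/p_j^{\gamma}(d)$, for every $y$ such that $\gamma(y) = d$, almost surely. Using this
observation, equation (7) may be rewritten as

$$\mu'_{ij}(\gamma, s) = \frac{E_i[(L^{\gamma}_{ji})^s \log L^{\gamma}_{ji}]}{E_i[(L^{\gamma}_{ji})^s]}; \tag{11}$$

similarly, equation (8) becomes

$$\mu''_{ij}(\gamma, s) = \frac{E_i[(L^{\gamma}_{ji})^s \log^2 L^{\gamma}_{ji}]}{E_i[(L^{\gamma}_{ji})^s]} - [\mu'_{ij}(\gamma, s)]^2. \tag{12}$$

Using the obvious inequality $(L^{\gamma}_{ji})^s \le 1 + L^{\gamma}_{ji}$, $\forall s \in (0, 1)$, we obtain the bound

$$|\mu'_{ij}(\gamma, s)| \le \frac{E_i[|\log L^{\gamma}_{ji}|] + E_i[L^{\gamma}_{ji} |\log L^{\gamma}_{ji}|]}{E_i[(L^{\gamma}_{ji})^s]} = \frac{E_i[|\log L^{\gamma}_{ji}|] + E_j[|\log L^{\gamma}_{ji}|]}{E_i[(L^{\gamma}_{ji})^s]}.$$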


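We have already proved that the numerator is bounded. We now establish a lower bound on
$E_i[(L^{\gamma}_{ji})^s]$. Since $E_i[L_{ji}] = 1$, it follows that there exists an $\mathcal{F}$-measurable set $Y_0 \subset Y$ and some
$\epsilon > 0$, $\delta > 0$, such that $P_i(Y_0) \ge \epsilon$ and $L_{ji}(y) \ge \delta$, $\forall y \in Y_0$. Since $x^s \ge \min\{1, x\}$, we obtain
$E_i[L_{ji}^s] \ge \epsilon \min\{1, \delta\}$, $\forall s \in [0, 1]$. We now use the fact that the function $\psi(x) = x^s$ is concave, for
any fixed $s \in [0, 1]$, and Jensen's inequality to obtain

$$E_i[(L^{\gamma}_{ji})^s] = E_i[(E_i[L_{ji} \mid \mathcal{F}^{\gamma}])^s] \ge E_i[E_i[L_{ji}^s \mid \mathcal{F}^{\gamma}]] = E_i[L_{ji}^s] \ge \epsilon \min\{1, \delta\}.$$

This concludes the proof that $\mu'_{ij}(\gamma, s)$ is bounded. The proof of the boundedness of $\mu''_{ij}(\gamma, s)$ is
identical and is omitted. •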


VI. THE NEYMAN-PEARSON PROBLEM.

In this section we consider the Neyman-Pearson version of the problem studied in the preceding
sections. We are given an observation set $Y$, endowed with a $\sigma$-field $\mathcal{F}$. There are two hypotheses
($M = 2$) and for each hypothesis we are given a measure $P_i$ on $(Y, \mathcal{F})$, $i = 1, 2$. Let $D$ be a fixed
positive integer and let $\Gamma$ be the set of all measurable functions $\gamma : Y \to \{1, \ldots, D\}$. As before, the
$i$-th sensor makes an independent observation $y_i$ whose statistics are described by $P_j$, assuming
that hypothesis $H_j$ is true. Again, the $i$-th sensor transmits a message $\gamma_i(y_i)$ to a fusion center,
where $\gamma_i \in \Gamma$, and finally the fusion center makes a final decision using a decision rule $\gamma_0$. We allow
$\gamma_0$ to be randomized. That is, the final decision of the fusion center may depend on the messages it
has received as well as an internally generated random variable. Let $\Gamma_0$ be the set of all candidate
decision rules $\gamma_0$ for the fusion center.

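For any given $(\gamma_0, \gamma_1, \ldots, \gamma_N) \in \Gamma_0 \times \Gamma^N$ consider the probabilities of error defined by

$$J_N^{I}(\gamma_0, \gamma_1, \ldots, \gamma_N) = P_1(\gamma_0(\gamma_1(y_1), \ldots, \gamma_N(y_N)) = 2), \tag{13}$$

$$J_N^{II}(\gamma_0, \gamma_1, \ldots, \gamma_N) = P_2(\gamma_0(\gamma_1(y_1), \ldots, \gamma_N(y_N)) = 1). \tag{14}$$

Let us fix a constant $\alpha$ belonging to $(0, 1)$. We would like to minimize $J_N^{II}(\gamma_0, \gamma_1, \ldots, \gamma_N)$, over all
$\gamma_0, \ldots, \gamma_N$ satisfying

$$J_N^{I}(\gamma_0, \gamma_1, \ldots, \gamma_N) \le \alpha. \tag{15}$$

The optimal value of $J_N^{II}$ falls exponentially with $N$ and we define

$$r_N(\gamma_0, \ldots, \gamma_N) = \frac{1}{N} \log J_N^{II}(\gamma_0, \ldots, \gamma_N).$$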


Let

$$R_N = \inf r_N(\gamma_0, \ldots, \gamma_N), \tag{16}$$

where the infimum is taken over all $(\gamma_0, \ldots, \gamma_N) \in \Gamma_0 \times \Gamma^N$ satisfying (15). We will use the following
assumption:

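Assumption 2: a) $P_1$ is absolutely continuous with respect to $P_2$;

b)

$$E_1\left[\log^2 \frac{dP_1}{dP_2}\right] = A < \infty, \tag{17}$$

where $dP_1/dP_2$ is the Radon-Nikodym derivative of the two measures.

We define $\mathcal{F}^{\gamma}$ and $P_i^{\gamma}$ as in Section V: $\mathcal{F}^{\gamma}$ is the $\sigma$-field on $Y$ generated by $\gamma$ and $P_i^{\gamma}$ is the
measure $P_i$ restricted to $\mathcal{F}^{\gamma}$. The argument in the proof of Proposition 3, in Section V, applies
here and shows that $E_1[\log^2(dP_1^{\gamma}/dP_2^{\gamma})] \le A$, $\forall \gamma \in \Gamma$. The latter inequality also implies that there
exists some $B < \infty$ such that

$$K(\gamma) = E_1\left[\log \frac{dP_1^{\gamma}}{dP_2^{\gamma}}\right] \le B, \qquad \forall \gamma \in \Gamma. \tag{18}$$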


The quantity $K(\gamma)$ defined by equation (18) may be recognized as the Kullback-Leibler [KL]
information distance between the distributions of the random variable $\gamma(y)$ under the two alternative
hypotheses. It is guaranteed to be nonnegative. Furthermore, Stein's Lemma [B] states that $K(\gamma)$
is the asymptotic error exponent if all sensors are using the same decision rule $\gamma$ and if the fusion
center chooses $\gamma_0$ according to the Neyman-Pearson Lemma. In light of this fact, the following
result should be expected.
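The role of $K(\gamma)$ as an error exponent can be explored numerically. The sketch below is only an illustration under stated assumptions: binary messages, all sensors using one common rule, and hypothetical message probabilities `p1` and `p2` (the probability that a sensor sends message 1 under the first and second hypothesis, with `p1 > p2`). The function names and the numerical values are ours, not the paper's. The fusion center sees only the number of ones, so the randomized Neyman-Pearson test can be built directly on the binomial distribution of that count.

```python
import math

def log_binom_pmf(n, k, p):
    """Log-probability of k successes in n trials, for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def kl_bernoulli(p1, p2):
    """Kullback-Leibler distance between Bernoulli(p1) and Bernoulli(p2)."""
    return p1 * math.log(p1 / p2) + (1 - p1) * math.log((1 - p1) / (1 - p2))

def np_type2_exponent(n, p1, p2, alpha):
    """(1/n) log of the smallest type II error subject to type I error <= alpha.

    Assumes p1 > p2, so the likelihood ratio of the count k increases with k
    and the fusion center decides for the second hypothesis on small counts.
    """
    pmf1 = [math.exp(log_binom_pmf(n, k, p1)) for k in range(n + 1)]
    pmf2 = [math.exp(log_binom_pmf(n, k, p2)) for k in range(n + 1)]
    budget = alpha   # type I probability still available
    type2 = 0.0      # probability of deciding hypothesis 1 under hypothesis 2
    for k in range(n + 1):               # smallest likelihood ratios first
        if pmf1[k] <= budget:
            budget -= pmf1[k]            # decide hypothesis 2 on this count
        else:
            accept = 1.0 - budget / pmf1[k]   # randomize on the boundary count
            type2 += accept * pmf2[k]
            budget = 0.0
    return math.log(type2) / n

if __name__ == "__main__":
    for n in (100, 1000):
        print(n, np_type2_exponent(n, 0.8, 1 / 3, 0.1), -kl_bernoulli(0.8, 1 / 3))
```

For these hypothetical values the computed exponent moves toward $-K(\gamma)$ as $N$ grows, although the convergence is slow (the gap shrinks roughly like $1/\sqrt{N}$).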

Theorem 2: If Assumption 2 holds, then

(i) $\lim_{N \to \infty} R_N = -\sup_{\gamma \in \Gamma} K(\gamma)$.

(ii) The value of $\lim_{N \to \infty} R_N$ stays the same if in the definition (16) we impose the additional constraint
$\gamma_1 = \cdots = \gamma_N$.

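Proof: (Outline) Fix some $\epsilon > 0$ and let $\gamma^* \in \Gamma$ be such that $K(\gamma^*) \ge \sup_{\gamma \in \Gamma} K(\gamma) - \epsilon$.

Let the fusion center choose $\gamma_0$ optimally, subject to (15). From Stein's Lemma, we obtain
$\lim_{N \to \infty} r_N(\gamma_0, \gamma^*, \ldots, \gamma^*) = -K(\gamma^*)$. In particular, $\limsup_{N \to \infty} R_N \le -K(\gamma^*) \le -\sup_{\gamma \in \Gamma} K(\gamma) + \epsilon$.
Since $\epsilon$ was arbitrary, we conclude that $\limsup_{N \to \infty} R_N \le -\sup_{\gamma \in \Gamma} K(\gamma)$ and we have shown
this bound to be valid under the additional constraint $\gamma_1 = \cdots = \gamma_N$.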

In order to complete the proof, it is sufficient to show that for any $\gamma_0, \ldots, \gamma_N$ satisfying (15) we
have

$$r_N(\gamma_0, \ldots, \gamma_N) \ge -\frac{1}{N} \sum_{i=1}^{N} K(\gamma_i) + f(N) \ge -\sup_{\gamma \in \Gamma} K(\gamma) + f(N), \tag{19}$$

where $f$ is a function with the property $\lim_{N \to \infty} f(N) = 0$ and which does not depend on
$\gamma_0, \ldots, \gamma_N$. While this result does not follow from the usual formulation of Stein's Lemma (which
uses the assumption $\gamma_1 = \cdots = \gamma_N$), it may be proved by a small variation of the proof of that
Lemma, and for this reason the proof is omitted. Suffice it to say that we may take the proof of
Stein's Lemma given in [B]. Wherever in that proof convergence in probability of a log-likelihood
ratio to its mean is asserted, we replace such a statement with an inequality which bounds the
probability of a deviation of a log-likelihood ratio from its mean. Such an inequality is obtained
from Chebyshev's inequality. Because of (17) the variance of the log-likelihoods of interest admits
the same bound, irrespective of the choice of the $\gamma_i$'s. For this reason, the function $f$ in (19) may
be taken independent of the $\gamma_i$'s. The proof is then completed by taking the infimum of both sides
of (19), over all $\gamma_0, \ldots, \gamma_N$, and then letting $N$ tend to infinity. •

We continue with a few observations. For simplicity we restrict our discussion to the case of
binary messages ($D = 2$).

It is easy to prove that there is no loss of optimality if we constrain the $\gamma_i$'s to correspond to
likelihood ratio tests [HV]. If we are only interested in asymptotics, the same conclusion may be
obtained from Theorem 2: it is not hard to show that if a decision rule does not have the form of
a likelihood ratio test, then another decision rule can be found for which $K(\gamma)$ is even larger. This
leads to the conclusion that, asymptotically, optimality is not lost by assuming that each $\gamma_i$ consists
of a comparison of the likelihood ratio computed by that sensor with a threshold.

As is well known, randomization is generally required in optimal hypothesis testing under the
Neyman-Pearson formulation. For this reason, we allowed the decision rule of the fusion center to
employ an internally generated random variable. We may ask whether anything can be gained by
allowing the sensors as well to use randomized decision rules. The answer is generally positive. For
example, if $N = 1$, then the best strategy is to let the single sensor perform an optimal
Neyman-Pearson test (for which randomization is needed) and have the fusion center adopt the decision of
the sensor. Interestingly enough, however, randomization does not help asymptotically as $N \to \infty$,
which we now prove. For any two measures $P$, $Q$ on $(Y, \mathcal{F})$, let $K(Q, P) = E[\log(dQ/dP)]$, where
the expectation is with respect to $Q$. With this notation, $K(\gamma) = K(P_1^{\gamma}, P_2^{\gamma})$, $\forall \gamma \in \Gamma$. It is known,
and easy to show, that $K(Q, P)$ is a convex function of $(Q, P)$. Suppose now that a sensor uses
a decision rule which involves randomization. The pair $(P_1^{\gamma}, P_2^{\gamma})$ of the probability distributions
of the message transmitted by a sensor using a randomized decision rule $\gamma$ lies in the convex hull
of such pairs of probability distributions corresponding to non-randomized decision rules. Using
the convexity of $K$, it follows that randomization cannot help in increasing the supremum of $K(\gamma)$
and, therefore, does not help asymptotically.
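The convexity argument can be checked numerically on a small instance. In the sketch below, `rule_a` and `rule_b` are hypothetical pairs of binary message distributions (under the first and second hypothesis) produced by two deterministic rules; the numbers are illustrative assumptions, not taken from the paper. A sensor that picks `rule_a` with probability `lam` and `rule_b` otherwise sends messages whose distributions are the corresponding mixtures.

```python
import math

def kl(p, q):
    """Kullback-Leibler distance sum_d p(d) log(p(d)/q(d)), with 0 log 0 = 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# (distribution under hypothesis 1, distribution under hypothesis 2); hypothetical values
rule_a = ([0.8, 0.2], [1 / 3, 2 / 3])
rule_b = ([0.9, 0.1], [0.6, 0.4])

def randomized(lam):
    """Message distributions when rule_a is used with probability lam, else rule_b."""
    def mix(u, v):
        return [lam * x + (1 - lam) * y for x, y in zip(u, v)]
    return mix(rule_a[0], rule_b[0]), mix(rule_a[1], rule_b[1])

k_a, k_b = kl(*rule_a), kl(*rule_b)
gaps = []
for step in range(11):
    lam = step / 10
    k_mix = kl(*randomized(lam))
    # Joint convexity of K: the mixture never beats the weighted average.
    gaps.append(lam * k_a + (1 - lam) * k_b - k_mix)
```

Every entry of `gaps` should be nonnegative up to rounding, so no value of `lam` yields a larger $K$ than the better of the two deterministic rules.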

From a computational point of view, the problem of this section is a little easier than the problem
of Section II, the reason being that we do not have the additional free parameter $s$ of Section II.
In particular, with decision rules parametrized by a scalar threshold, maximization of $K(\gamma)$ is
equivalent to a one-dimensional optimization problem. As there may be multiple local optima,
some form of exhaustive search may be required.
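For a finite observation set the exhaustive search is straightforward. The following sketch assumes binary messages, a strictly positive distribution `p2`, and that $K(\gamma)$ is the Kullback-Leibler distance of the message distribution under the first hypothesis from that under the second; the function names are ours. As sample data it borrows the three-point distribution of the Appendix, with observations indexed from 0 in the code.

```python
import math

def kl(p, q):
    """Kullback-Leibler distance with the convention 0 log 0 = 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def best_threshold_rule(p1, p2):
    """Try every likelihood ratio threshold and keep the one maximizing K.

    p1, p2: probabilities of each observation under hypotheses 1 and 2.
    Returns (K, observations mapped to message 1).
    """
    # Sort observations by decreasing likelihood ratio p1/p2.
    order = sorted(range(len(p1)), key=lambda y: p1[y] / p2[y], reverse=True)
    best = None
    for cut in range(1, len(order)):
        send_one = order[:cut]                     # message 1 above the threshold
        a1 = sum(p1[y] for y in send_one)
        a2 = sum(p2[y] for y in send_one)
        k = kl([a1, 1 - a1], [a2, 1 - a2])
        if best is None or k > best[0]:
            best = (k, sorted(send_one))
    return best

best = best_threshold_rule([0.8, 0.2, 0.0], [1 / 3, 1 / 3, 1 / 3])
```

With continuous observations the same idea applies on a grid of thresholds, at the cost of a discretization error that the grid spacing must control.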

As an illustration, we study the performance of a naive selection of the decision rule $\gamma$ of each
sensor. We let each sensor perform a maximum likelihood test and transmit its decision to the
fusion center. This is certainly a bad idea if $N = 1$ because in that case the sensor should
perform a Neyman-Pearson test which is, generally, different from a maximum likelihood test. Still,
one may wonder whether such a naive prescription has any performance guarantees, as $N \to \infty$.
The answer is negative, as the following example shows. Let $P_1$ and $P_2$ be as in Figure 1. A decision
rule $\gamma$ corresponding to a maximum likelihood test is to let $\gamma(y) = 1$ if and only if $y > 1/2$. For
this choice of $\gamma$, if we assume that $\epsilon$ is small enough and use a Taylor series expansion we obtain

where $A$ is some positive constant. Let us now consider the decision rule $\gamma$ given by $\gamma(y) = 1$ if
and only if $y > 1$. We then have $K(\gamma) = \log(1/(1 - \epsilon/2)) \ge \epsilon/2 + B\epsilon^2$, for some constant $B$. We
conclude from this example that the naive decision rule suggested above can be far from optimal
(in terms of error exponent) by an arbitrary multiplicative factor.
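The lower bound on $K(\gamma)$ for the second rule follows from the power series of the logarithm. In the worked step below, the explicit constant $1/8$ is our own choice for $B$, not a value stated in the text.

```latex
\log\frac{1}{1-\epsilon/2}
  = \sum_{k \ge 1} \frac{(\epsilon/2)^k}{k}
  = \frac{\epsilon}{2} + \frac{\epsilon^2}{8} + \frac{\epsilon^3}{24} + \cdots
  \;\ge\; \frac{\epsilon}{2} + \frac{\epsilon^2}{8},
  \qquad 0 < \epsilon < 2 .
```

Since this exponent is linear in $\epsilon$ to first order, any rule whose exponent is of order $\epsilon^2$ falls behind it by a factor that grows without bound as $\epsilon \to 0$.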
APPENDIX

We consider here the problem introduced in Section II, with two hypotheses ($M = 2$), binary
messages ($D = 2$), two sensors ($N = 2$), and with $y_1$, $y_2$ identically distributed and conditionally
independent given either hypothesis. We present an example which shows that
different sensors may have to use different decision rules even if their observations are identically
distributed. An example of this type was presented in [TeSa]. However, that example used a special
cost function which introduced a large penalty if both sensors send the same message and the wrong
decision is made by the fusion center. Naturally, this creates an incentive for the sensors to try
to transmit different messages, and therefore use different decision rules. Thus, the asymmetry of
the optimal decision rules of the two sensors can be ascribed to this particular aspect of the cost
function and does not prove that asymmetrical decision rules may be optimal for our cost function
(probability of error).


Our example is the following. We let $H_1$ and $H_2$ be equally likely. The observations $y_1$, $y_2$ are
conditionally independent, given either hypothesis, take values in $\{1, 2, 3\}$ and have the following
common distribution:

$P(y = 1 | H_1) = 4/5$, $P(y = 2 | H_1) = 1/5$, $P(y = 3 | H_1) = 0$,

$P(y = 1 | H_2) = 1/3$, $P(y = 2 | H_2) = 1/3$, $P(y = 3 | H_2) = 1/3$.

An optimal set of decision rules may be found by exhaustive enumeration. Since each sensor has
to perform a likelihood ratio test, there are only two candidate decision rules for each sensor:

(A) $u_i = 1$ iff $y_i = 1$,

(B) $u_i = 1$ iff $y_i \in \{1, 2\}$.

Thus, we need to consider three possibilities: (i) both sensors use (A); (ii) both sensors use (B);
(iii) sensor 1 uses (A) and sensor 2 uses (B). Naturally, we assume that the fusion center is using the
maximum a posteriori probability rule.

Explicit evaluation of the expected cost for each possibility shows that the optimal set of decision
rules consists of one sensor using decision rule (A), one sensor using decision rule (B) and the fusion
center deciding $H_1$ if and only if $u_1 = u_2 = 1$, for an expected cost of 19/90.
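The enumeration is small enough to reproduce with exact rational arithmetic. The sketch below is one possible way to organize it; the helper names (`message_dist`, `map_error`) are ours. It relies on the fact that, with equal priors, the maximum a posteriori fusion rule errs on each message pair with the smaller of the two joint probabilities.

```python
from fractions import Fraction as F
from itertools import product

# P[h][y-1] = P(y | H_h) for y = 1, 2, 3
P = {1: [F(4, 5), F(1, 5), F(0)],
     2: [F(1, 3), F(1, 3), F(1, 3)]}
# Observations that each candidate rule maps to message u = 1
RULES = {"A": {1}, "B": {1, 2}}

def message_dist(rule, h):
    """Distribution of one sensor's message u under hypothesis H_h."""
    p_one = sum(P[h][y - 1] for y in RULES[rule])
    return {1: p_one, 2: 1 - p_one}

def map_error(rule1, rule2):
    """Error probability of the MAP fusion rule with equal priors."""
    err = F(0)
    for u1, u2 in product((1, 2), repeat=2):
        joint = [message_dist(rule1, h)[u1] * message_dist(rule2, h)[u2]
                 for h in (1, 2)]
        err += F(1, 2) * min(joint)   # MAP picks the larger, errs with the smaller
    return err

costs = {pair: map_error(*pair) for pair in (("A", "A"), ("B", "B"), ("A", "B"))}
for pair, cost in costs.items():
    print(pair, cost)
```

The three costs come out as 53/225 for (A, A), 2/9 for (B, B) and 19/90 for (A, B), so the asymmetric assignment is strictly best, in agreement with the value quoted above.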




